The Twofish Encryption Algorithm: A 128-Bit Block Cipher The Twofish Encryption Algorithm: A 128-Bit Block Cipher
by Bruce Schneier ; John Kelsey ; Doug Whiting ; David Wagner ; Chris Hall ; Niels Ferguson
Wiley Computer Publishing, John Wiley & Sons, Inc.
ISBN: 0471353817   Pub Date: 03/01/99
  

Previous Table of Contents Next


Chapter 6
Twofish Design Philosophy

In the design of Twofish, we tried to stress the following principles:

Performance. When comparing different options, compare them on the basis of relative performance.
Conservativeness. Do not design close to the edge. In other words, leave a margin for error and provide more security than is provably required. Also, try to design against attacks that are not yet known.
Simplicity. Do not include ad hoc design elements without a clear reason or function. Try to design a cipher whose details can be easily kept in one’s head.

These principles were applied not only to the overall design of Twofish, but to the design of the S-boxes and the key schedule.

6.1 Performance-Driven Design

The goal of performance-driven design is to build and evaluate ciphers on the basis of performance [SW97]. The early post-DES cipher designs would often compete on the number of rounds in the cipher. The original FEAL paper [SM88], for example, discussed the benefits of a stronger round function and fewer rounds. Other cipher designs of the period—REDOC-II [CW91], LOKI89 [BPS90] and LOKI91 [BKPS93], IDEA [LM91, LMM91]—only considered performance as an afterthought. Khufu/Khafre [Mer91] was the first published algorithm that explicitly used operations that were efficient on 32-bit microprocessors; SEAL [RC94, RC98] is a more recent example. RC2 [Riv97, KRRR98] and Jeroboam [CM98] were designed for 16-bit microprocessors, SOBER [Ros98] for 8-bit ones. Other, more recent designs, do not seem to take performance into account at all. Two 1997 designs, SPEED [Zhe97]1 and Zhu-Guo [ZG97], are significantly slower than alternatives that existed years previous.


1SPEED has been cryptanalyzed in [HKSW98, UTK98, HKR+98].

Arbitrary metrics, such as the number of rounds, are not good measures of performance. What is important is the cipher’s speed: the number of clock cycles per byte encrypted. When ciphers are analyzed according to this property, the results can be surprising [SW97]. RC5 might have twice the number of rounds of DES,2 but since its round function is more than twice as fast as DES’, RC5 is faster than DES on most microprocessors.


2Here we use the term “round” in the traditional sense: as it was defined by DES [NBS77] and has been used to describe Feistel-network ciphers ever since. The RC5 documentation [Riv95] uses the term “round” differently: one RC5-defined round equals two Feistel rounds.

Even when cryptographers made efforts to use efficient 32-bit operations, they often lacked a full appreciation of low-level software optimization principles associated with high-performance CPUs. Thus, many algorithms are not as efficient as they could be. Minor modifications in the design of Blowfish [Sch94], SEAL [RC94, RC98], and RC4 [Sch96] could improve performance without affecting security [SW97] (or, alternatively, increase the algorithms’ complexity without affecting performance). In designing Twofish, we tried to evaluate all design decisions in terms of performance.

Since NIST’s platform of choice was the Intel Pentium Pro [NIST97b], we concentrated on that platform. However, we did not ignore performance on other 32-bit CPUs, as well as 8-bit and 16-bit CPUs. If there is any lesson from the past twenty years of microprocessors, it is that the high end gets better and the low end never goes away. Yesterday’s top-of-the-line CPUs are currently in smart cards. Today’s CPUs will eventually be in smart cards, while the 8-bit microprocessors will move to devices even smaller. The only thing we did not consider in our performance metrics is bitslice implementations [Bih97, SAM97, NM97], since these can only be used in very specialized applications and often require unrealistic implementations; e.g., 32 simultaneous ECB encryptions, or 32 interleaved IVs.3


3One AES submission, Serpent [BAK98], uses ideas from bitslice implementations to create a cipher that is optimized for 32-bit processors while sacrificing performance on 8-bit processors.

6.1.1 Performance-driven Tradeoffs

During our design, we constantly evaluated the relative performance of different modifications to our round function. Twofish’s round function encrypts at about 20 clock cycles; 16 rounds translates to about 320 clock cycles per block encrypted. When we contemplated a change to the round function, we evaluated it in terms of increasing or decreasing the number of rounds to keep performance constant. For example:

  We could have added a data-dependent rotation to the output of the two MDS matrices in each round. This would add 10 clock cycles to the round function on the Pentium (two on the Pentium Pro). To keep the performance constant, we would have to reduce the number of rounds to 11. The question to ask is: Are 11 rounds of the modified cipher more or less secure than 16 rounds of the unmodified cipher?
  We could have removed the one-bit rotation. This would have saved clocks equivalent to one Twofish round. Are 17 rounds of this new round function more or less secure than 16 rounds of the old one?
  We could have defined the key-dependent S-boxes using the whole key, instead of half of it. This would have doubled key setup time on high-end machines, and halved encryption speed on memory-poor implementations (where the S-boxes could not be precomputed). On memory-poor machines, we would have to cut the number of rounds in half to be able to afford this. Are 8 rounds of this improved cipher better than 16 rounds of the current design?

This analysis is necessarily dependent on the microprocessor architecture the algorithm is being compared on. While we focused on the Intel Pentium architecture, we also tried to keep 8-bit smart card and hardware implementations in mind. For example, we considered using an 8-by-8 MDS matrix over GF(24) to ensure a finer-grained diffusion, instead of a 4-by-4 MDS matrix over GF(28); the former would have been no slower on a Pentium but at least twice as slow on a low-memory smart card.

6.2 Conservative Design

In the last decade there has been considerable research in designing ciphers to be resistant to known attacks [Nyb91 Nyb93, OCo94a, OCo94b, OCo94c, Knu94a, Knu94b, Nyb94, DGV94b, Nyb95, NK95, Mat96, Nyb96], such as differential [BS93], linear [Mat94], and related-key cryptanalysis [WH87, Bih94, KSW96, KSW97]. This research has culminated in strong cipher designs—CAST-128 [Ada97a] and MISTY [Mat97] are probably the most noteworthy—as well as some excellent cryptanalytic theory.


Previous Table of Contents Next